An Analysis of Direct Reinforcement Learning in Non-Markovian Domains
Abstract
It is well known that for Markov decision processes, the policies stable under policy iteration and the standard reinforcement learning methods are exactly the optimal policies. In this paper, we investigate the conditions for policy stability in the more general situation where the Markov property cannot be assumed. We show that for a general class of non-Markov decision processes, if actual-return (Monte Carlo) credit assignment is used with undiscounted returns, the optimal observation-based policies are still guaranteed to be equilibrium points in the policy space under the standard "direct" reinforcement learning approaches. However, if either discounted rewards or a temporal-differences style of credit assignment is used, this is not the case.
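To make the abstract's contrast concrete, here is a minimal sketch in Python. It is not taken from the paper: the tabular setting, the episode format, and all names (q_mc, q_td, alpha, gamma) are illustrative assumptions. It contrasts undiscounted actual-return (Monte Carlo) credit assignment with a discounted one-step TD-style update over observation-action values.

    from collections import defaultdict

    alpha = 0.1   # step size
    gamma = 0.9   # discount factor (used by the TD-style variant only)

    q_mc = defaultdict(float)  # observation-action values, Monte Carlo
    q_td = defaultdict(float)  # observation-action values, TD(0)-style

    def mc_update(episode):
        # Actual-return credit assignment with undiscounted returns: each
        # (observation, action) pair moves toward the sum of all rewards
        # received from that step to the end of the episode.
        for t, (obs, act, _) in enumerate(episode):
            g = sum(r for _, _, r in episode[t:])  # undiscounted actual return
            q_mc[(obs, act)] += alpha * (g - q_mc[(obs, act)])

    def td_update(obs, act, reward, next_obs, next_act):
        # One-step TD-style credit assignment: bootstrap from the estimated
        # value of the next observation instead of the observed return.
        target = reward + gamma * q_td[(next_obs, next_act)]
        q_td[(obs, act)] += alpha * (target - q_td[(obs, act)])

    # Example episode of (observation, action, reward) triples.
    episode = [("o1", "left", 0.0), ("o2", "right", 1.0), ("o1", "left", 2.0)]
    mc_update(episode)
    td_update("o1", "left", 0.0, "o2", "right")

Under observation aliasing (here "o1" may label two different underlying states), the bootstrapped TD target mixes value estimates across those states; that is the kind of effect behind the abstract's negative result for discounted and TD-style credit assignment.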
Similar resources
An Echo State Model of Non-Markovian Reinforcement Learning
There exists a growing need for intelligent, autonomous control strategies that operate in real-world domains. Theoretically the state-action space must exhibit the Markov property in order for reinforcement learning to be applicable. Empirical evidence, however, suggests that reinforcement learning also applies to doma...
Description and Acquirement of Macro-Actions in Reinforcement Learning
Reinforcement learning is a framework that enables agents to learn from interaction with their environments. It has generally focused on Markov decision process (MDP) domains, but real-world domains may be non-Markovian. In this paper, we develop a new description of macro-actions for non-Markov decision process (NMDP) domains in reinforcement learning. A macro-action is an action control struct...
Reinforcement Learning through Global Stochastic Search in N-MDPs
Reinforcement Learning (RL) in either fully or partially observable domains usually poses a requirement on the knowledge representation in order to be sound: the underlying stochastic process must be Markovian. In many applications, including those involving interactions between multiple agents (e.g., humans and robots), sources of uncertainty affect rewards and transition dynamics in such a wa...
On using discretized Cohen-Grossberg node dynamics for model-free actor-critic neural learning in non-Markovian domains
We describe how multi-stage non-Markovian decision problems can be solved using actor-critic reinforcement learning by assuming that a discrete version of Cohen-Grossberg node dynamics describes the node-activation computations of a neural network (NN). Our NN (i.e., agent) is capable of rendering the process Markovian implicitly and automatically in a totally model-free fashion without learning...
C-trace: a New Algorithm for Reinforcement Learning of Robotic Control
There has been much recent interest in the potential of using reinforcement learning techniques for control in autonomous robotic agents. How to implement effective reinforcement learning in a real-world robotic environment still involves many open questions. Are standard reinforcement learning algorithms like Watkins' Q-learning appropriate, or are other approaches more suitable? Some specific...
Publication date: 1998